MacTech Vol 09-1993

Read Assembly

Volume Number: 9

Issue Number: 5

Column Tag: Assembly Workshop

The Secrets of the Machine

Or, how to read Assembler

By Malcolm H. Teas, Rye, New Hampshire

About the author

Malcolm H. Teas, 556 Long John Road, Rye, NH 03870

Internet: mhteas@well.sf.ca.us

AppleLink: mhteas@well.sf.ca.us@INTERNET#

America Online: mhteas

Why Read Assembler?

When we write programs in C or Pascal, what we’re really doing is writing in the computer’s second language. When I studied French, I was always translating it into English in my head to understand it. Well, that’s just what the computer’s doing. It’s taking C, Pascal, or whatever you’re programming with and translating it to its native language - assembler. Translation is the compiler’s job. But just as my French translations would lead to errors and awkward speech, the compiler can occasionally make mistakes, and the code it creates from your source is a little awkward too - it isn’t always the most efficient. Often this doesn’t matter much, since the CPU is quite fast. However, if you’re writing a time-critical algorithm, if the speed of your application just isn’t what you want it to be, or if you suspect that there’s some strange error, then it’s time to talk to the machine in its native language. This article is a traveller’s phrasebook.

Where do I find assembler?

Although this wasn’t always so, these days you can usually find some way to examine either the assembler translated from your source or a disassembly of your application. If you’re using the latest version of Think C (version 5.0), try the “Disassemble” item in the “Source” menu. This generates the assembler translation of the source file in the front window. The new window that results shows the assembler and can be printed, saved, and otherwise treated like any other Think C editor window.

MPW (Macintosh Programmer’s Workshop) also offers a number of tools to get at your assembler listings. The DumpCode tool takes any type of code resource and disassembles it. It can also list the jump table and other information that is included. I’ll talk about jump tables when I cover the memory map of an application. If you use the SourceBug debugger, it has an option to view your code as either the original source language or assembler.

ResEdit now has an external that disassembles a code resource when you open it. It is quite helpful in finding the targets of jumps and other memory addresses, since it shows them graphically with arrows. Unfortunately, it doesn’t permit editing of the assembler. While this external is not officially supported by Apple, it has worked well for me.

If you cannot get any of these, you can always use a low-level debugger like MacsBug or TMON. By dropping into the debugger while your application is running, you can disassemble the code you’re interested in and save it to a file. In MacsBug, you’d use the “ip” command to disassemble around the program counter, then use the “log” command to save the screen to a file.

What the computer really looks like.

When you read assembler, you see instructions that refer to registers and memory locations and that have an unusual syntax. The programming environment for assembler is the bare machine, so it has some constraints. To learn to read assembler, you need to know something about that environment. Actually, this is the hard part; reading the assembler instructions themselves is easy.

First are the registers. The CPUs (central processing units) used these days have a number of registers to hold the data or addresses currently in use by the program. The Motorola chips used in the Macintosh (the 680x0 family) have eight data registers (D0 through D7), eight address registers (A0 through A7), a program counter (also called the PC), and a status register that contains the condition codes. The 68020 and later chips have some other specialized registers that the Mac’s operating system uses for handling interrupts, mapping memory, and managing the CPU’s cache. However, these are used only by the operating system and are not of interest to the application programmer.

Data registers are used mostly by instructions that manipulate data, like the logical and arithmetic instructions. Address registers are used to address locations in the computer’s memory. They’re often used to index data and can be used in “move” instructions to help calculate the memory location of data. Address register seven (A7) is used by the CPU as the stack pointer. Some instructions can address data on the stack and automatically push or pop it. The Mac operating system has a convention of using address register five (A5) as the pointer to the top of an application’s global data and address register six (A6) as the stack frame pointer. I’ll cover more of the stack frame and global data space later.
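
To make the register set and the automatic push and pop concrete, here is a minimal C sketch rather than real 680x0 code. The Cpu struct, the tiny memory array, and the push16 and pop16 helpers are all hypothetical names invented for illustration. The sketch assumes the usual 680x0 behavior: the stack grows toward lower addresses, a push lowers A7 before storing (the -(A7) form), a pop loads and then raises A7 (the (A7)+ form), and words are stored high byte first.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_SIZE = 64 };              /* toy memory size, an assumption */

/* Hypothetical model of the registers an application sees. */
typedef struct {
    uint32_t d[8];   /* data registers D0-D7 */
    uint32_t a[8];   /* address registers A0-A7; a[7] is the stack pointer */
    uint32_t pc;     /* program counter */
    uint16_t sr;     /* status register; its low byte holds the condition codes */
} Cpu;

static uint8_t memory[MEM_SIZE];     /* memory modeled as one byte array */

/* Like MOVE.W value,-(A7): lower the stack pointer first, then store. */
static void push16(Cpu *cpu, uint16_t value)
{
    assert(cpu->a[7] >= 2 && cpu->a[7] <= MEM_SIZE);
    cpu->a[7] -= 2;
    memory[cpu->a[7]]     = (uint8_t)(value >> 8);    /* high byte first */
    memory[cpu->a[7] + 1] = (uint8_t)(value & 0xFF);
}

/* Like MOVE.W (A7)+,Dn: load first, then raise the stack pointer. */
static uint16_t pop16(Cpu *cpu)
{
    assert(cpu->a[7] + 2 <= MEM_SIZE);
    uint16_t value = (uint16_t)((memory[cpu->a[7]] << 8) | memory[cpu->a[7] + 1]);
    cpu->a[7] += 2;
    return value;
}
```

Starting with a[7] set to MEM_SIZE, two pushes followed by two pops return the values in reverse order and leave A7 back where it started.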

The program counter (PC) is a special register that holds the address of the next instruction to execute. The status register (SR), whose low byte is the condition code register (CCR), holds flags showing the results of the last data operation: zero, negative, carry, overflow, and so on. These flags are tested by the conditional branch instructions that implement “if” statements, loops, and multi-way ifs like the C “switch” statement.
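
As a rough illustration of how flags drive branches, the C sketch below models a 16-bit compare that sets four condition flags, plus the tests that the conditional branches BEQ, BLT, and BCS apply to them. The Flags struct and the function names are hypothetical, and the real CCR also has an extend (X) flag that is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the condition flags set by a compare. */
typedef struct {
    bool n;   /* negative: top bit of the result is set */
    bool z;   /* zero: the result is zero */
    bool v;   /* overflow: the signed result did not fit in 16 bits */
    bool c;   /* carry: an unsigned borrow occurred */
} Flags;

/* Models a 16-bit compare: computes dst - src and keeps only the flags. */
static Flags compare16(uint16_t dst, uint16_t src)
{
    uint16_t result = (uint16_t)(dst - src);
    Flags f;
    f.n = (result & 0x8000u) != 0;
    f.z = (result == 0);
    /* Overflow: operands have different signs and the result's sign differs from dst's. */
    f.v = (((dst ^ src) & (dst ^ result)) & 0x8000u) != 0;
    f.c = src > dst;
    return f;
}

/* Conditions tested by three common conditional branches. */
static bool cond_eq(Flags f) { return f.z; }          /* BEQ: equal */
static bool cond_lt(Flags f) { return f.n != f.v; }   /* BLT: signed less than */
static bool cond_cs(Flags f) { return f.c; }          /* BCS: unsigned lower */
```

A C test such as “if (a < b)” on signed shorts typically compiles to a compare followed by one of these branches, so recognizing the compare-and-branch pair is most of what it takes to spot an “if” in a listing.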

To an assembler language programmer, the memory of the Mac just looks like one big array. Some of this array holds the program, some holds the system, some holds the application’s data, and the rest is used by other applications. This explains why one application’s bugs can cause problems for other applications. An application can overwrite the contents of memory anywhere, so the data or code of another application can be damaged too. As a result, keeping track of pointers and handles is quite important.
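
The “big array” view can be sketched in a few lines of C. This is a toy model, not the Mac’s actual layout: the region sizes, the APP_A_BASE and APP_B_BASE constants, and the app_fill helper are all made up for illustration. The point is only that nothing stops a write that starts in one application’s region from running into its neighbor’s.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy layout: two 16-byte regions side by side in one array. */
enum { MEM_SIZE = 32, APP_A_BASE = 0, APP_B_BASE = 16 };

static uint8_t memory[MEM_SIZE];

/* Fill count bytes starting at base + index. Like the bare machine, this
   does not check that the range stays inside the caller's own region;
   the assert only keeps the toy model inside its array. */
static void app_fill(uint32_t base, uint32_t index, uint32_t count, uint8_t value)
{
    assert(base + index + count <= MEM_SIZE);
    memset(&memory[base + index], value, count);
}
```

If application A fills 8 bytes starting at offset 12 of its 16-byte region, the last 4 bytes land at the start of application B’s region and silently replace whatever B had stored there.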

From an application’s point of view, though, the Mac’s memory is organized into application memory areas, each of which holds a heap, stack, and global data. Every application is expected to stay in its own area. Low memory belongs to the interrupt table and system globals. Above that is the system heap, followed by the MultiFinder area. In high memory are the addresses of the cards and I/O devices that the Mac is equipped with. The Mac operating system divides the MultiFinder area into application memory areas, or partitions, one for each application in memory at the time. The size of an application’s partition is taken from its ‘SIZE’ resource when the application is launched. If there is no ‘SIZE’ resource, a default partition size of 512K bytes is used. The user can change the partition size in the Finder’s “Get Info” box, which creates a new ‘SIZE’ resource. (See Inside Mac VI page 5-14 for more information on the ‘SIZE’ resource.)

This application memory partition is, in turn, subdivided into the application’s heap, stack, and global area. Your application’s code, opened resources, and handle and pointer blocks are all in the heap, which occupies the bottom part of the partition and may grow upward. The jump table, global variables, and the QuickDraw application globals are all stored in the global area at the top of the partition. Register A5 (by convention) points into this area, at the top of the application’s globals. When a routine references global data, it does so with a negative offset from register A5. Because this addressing mode encodes the offset in the instruction as a signed 16-bit value, the maximum size of the application globals is 32K. Although some compilers have ways around this limit, it’s best to stay under it: larger global areas are more difficult to access and make your program less efficient. Parameters for routines and local variables are stored on the stack, which grows downward in memory and is located just beneath the QuickDraw application globals.
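
The 32K figure falls out of the arithmetic. The C sketch below shows the address calculation for a global reached as an offset from A5, assuming the offset is a signed 16-bit displacement that is sign-extended before the add. The function names are hypothetical, and the A5 value used in any example is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Address of a global reached as offset(A5). The offset is stored in the
   instruction as a signed 16-bit value and sign-extended before the add. */
static uint32_t a5_relative(uint32_t a5, int16_t offset)
{
    return a5 + (uint32_t)(int32_t)offset;
}

/* Can a global lying `distance` bytes below A5 be reached with a single
   16-bit displacement? The most negative int16_t value is -32768. */
static int reachable_below_a5(uint32_t distance)
{
    return distance <= 32768u;
}
```

With A5 at 0x00100000, an offset of -4 addresses 0x000FFFFC, and the most negative offset reaches 0x000F8000, exactly 32K below A5. A global any farther away needs extra instructions to compute its address, which is why going past the limit costs efficiency.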

The jump table is fixed in memory for the life of the application, so it is used to get around the 32K limit on ‘CODE’ resources and to allow them to be moved in memory. When a routine is called that isn’t in the same code resource as the calling routine, the compiler and linker make a jump table entry. This is a jump instruction to the other code resource. So, the calling routine does a JSR (Jump to Subroutine) to the address in the jump table, and the jump table then jumps to the location in the new code resource. When a code resource is moved in memory, the jump table is corrected. This also allows the Segment Loader (part of the Mac Toolbox) to
